-
-
Notifications
You must be signed in to change notification settings - Fork 35.9k
Introduction of parameterizable workgroups for compute shaders and dispatch sizes for the renderer #31402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
📦 Bundle sizeFull ESM build, minified and gzipped.
🌳 Bundle size after tree-shakingMinimal build including a renderer, camera, empty scene, and dependencies.
|
…ecification of the dispatchSize
…ecification of the dispatchSize
…ecification of the dispatchSize
73116e1
to
910ce62
Compare
src/nodes/gpgpu/ComputeNode.js
Outdated
let count = countOrWorkgroupSize; | ||
|
||
if ( Array.isArray( countOrWorkgroupSize ) ) { | ||
|
||
workgroupSize = countOrWorkgroupSize; | ||
count = null; | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the project agrees with the API change I'm wondering if we should deprecate the old code path where "count" is passed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've set it up so that count is used if no workgroup is specified, so that existing user code and examples continue to run as before. But yes, I would also be in favor of removing the count from the code in the long term, with the usual transition time. This also significantly simplifies the compute call in the backend.
const isValid = dispatchSize[ 0 ] > 0 && dispatchSize[ 1 ] > 0 && dispatchSize[ 2 ] > 0; | ||
|
||
dispatchSize = isValid ? dispatchSize : computeNodeData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's valid to set @workgroup_size to 1 or 2 values when defining the parameters inline with the remaining dimensions implicitly being set to "1" (see the spec here) so it would be nice to allow for providing workgroup size of 1 or 2 to align with the native behavior and for the sake of ergonomics:
const computeShader = wgslFn( `fn fragmentShader() -> void {}` );
// all are equivalant
const kernel = computeShader().compute( [ 16 ] );
const kernel = computeShader().compute( [ 16, 1 ] );
const kernel = computeShader().compute( [ 16, 1, 1 ] );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that from the W3C documentation, I'll take a look at it, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be implemented in two ways. Since it makes no difference in performance, I prefer to check the length of the workgroupSize array before calling the wgsl code builder to fill it with ones if smaller than a three-valued array
this.computeShader = this._getWGSLComputeCode( shadersData.compute, ( this.object.workgroupSize || [ 64 ] ).join( ', ' ) ); | ||
const workgroupSize = this.object.workgroupSize || [ 64, 1, 1 ]; | ||
|
||
if ( workgroupSize.length !== 3 ) throw new Error( 'workgroupSize must have 3 elements' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise dispatchWorkgroups can also take 1 or 2 parameters with the remainder defaulting to 1:
// all equivelant
renderer.computeAsync( kernel, [ 10 ] );
renderer.computeAsync( kernel, [ 10, 1 ] );
renderer.computeAsync( kernel, [ 10, 1, 1 ] );
I'm curious to hear other opinions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's simple. I can do it tonight ( CET ). If you have any other requests by then, I can include them as well
src/renderers/common/Renderer.js
Outdated
@@ -2351,6 +2352,38 @@ class Renderer { | |||
|
|||
} | |||
|
|||
if ( dispatchSize !== null ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave this just to compute()
node, so we avoid duplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you mean move the entire validation to the compute() and just pass the dispatchSize through in the Renderer like before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's better to just use the workgroupSize from the compute() node itself. I would also suggest reverting to the default value [ 64 ], and removing support for non-array numbers. Then we can add these details in the js-doc about auto-fill by 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting to play with the values. I like to use the webgpu_compute_particles_rain example
//usual
184: computeParticles = computeUpdate().compute( maxParticleCount );
355: renderer.compute( computeParticles );
//with workgroups and dispatchSize
184: computeParticles = computeUpdate().compute( [ 64 ] );
355: renderer.compute( computeParticles, [ 700 ] );
184: computeParticles = computeUpdate().compute( [ 16, 16 ] );
355: renderer.compute( computeParticles, [ 10, 10 ] );
or with too low values so that not everything is computed
184: computeParticles = computeUpdate().compute( [ 64 ] );
355: renderer.compute( computeParticles, [ 100 ] );
184: computeParticles = computeUpdate().compute( [ 16, 16 ] );
355: renderer.compute( computeParticles, [ 10, 5 ] );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user could already do this creating two nodes:
const comp = computeUpdate();
computeParticles1 = comp.compute( [ 64 ] );
computeParticles2 = comp.compute( [ 700 ] );
renderer.compute( computeParticles1 );
renderer.compute( computeParticles2 );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a possible case, but it's not the only one, and I think we'll move further away from render.compute()
as we improve TSL. Related: #30768 (comment)
I didn't realize you had changed the function signature regarding dispatch. I would like to suggest a change to the design of this PR.
ComputeNode Function signature compute( dispatchs, workdgroups )
// compute( dispatch, workdgroups )
// compute( numberOrArray, numberOrArray )
nodeFn().compute( [ 1000 ], [ 64 ] )
// or
nodeFn().compute( 1000, 64 )
// or
nodeFn().compute( 1000, [ 64, 1 ] )
// etc
dispatchs
or workdgroups
can accept number or array as input, both will pass through a function that will resolve the number or array, returning an array of 3 values. This function is shared can be inside ComputeUtils.js
.
We don't use thrown
but console.error()
and try not to crash the renderer completely.
We can go like this too, I agree with you and I think it can be useful
renderer.compute( computeParticles, 700 );
// or
renderer.compute( computeParticles, [ 700 ] );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can go like this too, I agree with you and I think it can be useful
// pass work group size on kernel creation nodeFn().compute( [ 64 ] ); // pass dispatch count on evaluation renderer.compute( computeParticles, [ 700 ] );
To chime in - my preference is to pass the "dispatch count" separate from work group size when evaluating the compute kernel (whether that the be renderer.compute or whatever future incarnation of the compute API).
Perhaps it's because I'm familiar with the underlying APIs here but my impression is that when calling nodeFn().compute( workGroupSize )
the compute shader must be recreated (or dirty checked somehow) since the work group size values must be inlined in the shader string, which shouldn't be needed when changing the dispatch count. This means for a number of use cases you'd have to rebuild the compute node whenever the screen resizes, which seems wasteful. All to say I think the separation of operations feels valuable and mirrors the WebGPU API.
And thanks to both of you for the work here! The compute API is shaping up quite nicely, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for participating too @gkjohnson :)
My concern with workGroupSize
be the first parameter like nodeFn().compute( workGroupSize )
is that the compute()
is a node, and can be used inside the material, being processed only when the object/material is rendered like this example. We have the same countOfPoints
equivalent to the number of instances(a good idea would be to automate based on the number of instances if no value is added), but this may not be the general case. The user could still overwrite the number of dispatches using renderer.compute( computeParticles, 700 )
and maybe using computeParticles.setCount()
.
In short, compute()
can be used without the renderer.compute()
call if the user binds to a NodeMaterial
, so it needs a number of dispatches.
workGroupSize
is also quite advanced for most and I hope we can automate it based on dispatchSize and nodes of Fn
call if the user hasn't defined it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also extend it without changing the current functionality like nodeFn().computeInstance( workGroupSize )
.
// ComputeNode.js
export const computeInstance = ( workgroupSize ) => compute( null, workgroupSize );
//
const computeParticles = nodeFn().computeInstance( workGroupSize );
renderer.compute( computeParticles, [ 700 ] );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize you had changed the function signature regarding dispatch. I would like to suggest a change to the design of this PR.
Ah, that explains my confusion. It's important to be able to change the dispatchSize with each render call. I'm also thinking about dispatchWorkgroupsIndirect, which someone has already requested. I had even started a PR for this but did not continue because I knew that it would immediately lead to a new issue if the person found that the workgroups were not parameterizable.
Once defined, the workgroups are fixed. However, the dispatchSize can be changed in any render call, which I find very valuable. This is important for dynamic computes and also for the mentioned case of dispatchWorkgroupsIndirect which I already have in mind to do,
or just in case that @gkjohnson mentioned someone scales the renderscreen.
const comp = computeUpdate();
//workgroups, the computeNode is built ( riggid part )
computeParticles = comp.compute( [ 16, 16 ] );
//The existing computeNode can be rendered with different dispatchSize at any time ( dynamic part )
renderer.compute( computeParticles, [ 10, 10 ] );
Your objection to saving code gave me an idea. The validation conditions for the workgroupSize and the dispatchSize are the same or almost the same. If necessary, a flag can be used to differentiate between them. The validation conditions don't have to be repeated independently in the computeNode and the backend. A validation helper function that we simply call for the workgroupSize and in the backend for the dispatchSize could makes sense. What do you think about?
Related issue: #31393
So far, the workgroup for the compute shader consists of a scalar and the dispatchSize always uses dispatch.x up to
device.limits.maxComputeWorkgroupsPerDimension
before dispatch.y is used. The default workgroup is now [ 64, 1, 1 ] instead of [ 64 ]. This ensures that all previous examples are treated as before.Example on webgpu_compute_particles_rain, usual usage:
Use with workgroup and dispatchSize
Since the instanceCount in the example is 25,000, workgroup = [ 16, 16, 1 ] and dispatchSize = [ 10, 10, 1 ] = 16 * 10 * 16 * 10 * 1 = 25,600 threads, enough to cover everything. If you choose smaller values for dispatchSize, e.g., [ 9, 9, 1 ], you would see that not all raindrops are computed.
You now have full control over the thread distribution for the GPU, which was not possible.
I'm setting the PR to draft for now, even though it works, because I might think of a few more things I'd like to add. Maybe warnings if you enter something incorrectly.